Hybrid deployments #563

Draft: wants to merge 2 commits into base: main

radez (Collaborator) commented Oct 3, 2024

Adds hybrid deployments: a bare-metal control plane with virtual workers.
I think this will also handle a bare-metal control plane with both bare-metal and virtual workers.

radez (Collaborator, Author) commented Oct 3, 2024

Working on testing this. I think it "works", but I wanted some feedback on what I'm missing or not implementing correctly.

- name: Libvirt - Pause for power down
  pause:
    seconds: 1
  when: not redfish_forceoff.failed
Member

Might be able to use a "check for powered down" type of task here instead of a sleep.

Member

I was able to remove this entirely on my tested deployments (3, 27, and 54 VMs)
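
If a wait does turn out to be needed at larger scale, the "check for powered down" idea could look roughly like the task below. This is an untested sketch, not the PR's implementation. It assumes the Redfish emulator on port 9000 serves the system resource under `Systems/<domain_uuid>` and reports the standard Redfish `PowerState` property, and that `item` comes from the surrounding loop as in the quoted task. The register name `redfish_power_state`, the retry count, and the delay are illustrative.

```yaml
# Sketch only: poll the system resource until it reports PowerState "Off".
# Assumes a Redfish emulator on port 9000 and the Systems/<domain_uuid> path.
- name: Libvirt - Wait for power down
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Systems/{{ hostvars[item]['domain_uuid'] }}"
    method: GET
  register: redfish_power_state  # hypothetical variable name
  # default() guards against a failed request that returns no JSON body
  until: (redfish_power_state.json | default({})).PowerState | default('') == 'Off'
  retries: 30  # illustrative upper bound
  delay: 1     # seconds between polls
  when: not redfish_forceoff.failed
```

Unlike a fixed `pause`, this returns as soon as the VM reports that it is off and fails visibly if it never does.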

radez force-pushed the hybrid branch 2 times, most recently from 1731073 to 2d70df6, October 8, 2024 12:58
radez self-assigned this Oct 8, 2024

radez (Collaborator, Author) commented Oct 8, 2024

re: issue #536

radez linked an issue Oct 8, 2024 that may be closed by this pull request
radez force-pushed the hybrid branch 2 times, most recently from faee549 to 9e0ea2c, October 15, 2024 19:27

openshift-ci bot commented Nov 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: radez

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines +27 to +30
- role: boot-iso
  vars:
    inventory_group: hv_vm
    index: "{{ hybrid_worker_count }}"
akrzos (Member) commented Jan 2, 2025

In the context of someone running an ACM scale test where all of the hv_vm entries are actually, say, SNOs, does this task dump thousands of lines of skipped tasks, or does it just skip the role? If it dumps thousands of lines, I think we should revisit how this is performed, perhaps using a different inventory group.

radez (Collaborator, Author)

You're right, it does dump thousands of lines. I had looked at adding a loop_var to this at one point to make the output more meaningful.

Member

I'm wondering if, instead of including another boot-iso role here over the hv_vm workers, we could just copy the desired number of hv_vm entries we want to use under workers. I will think of a more automated way to accomplish this as well. WDYT?
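
One way to avoid iterating over every hv_vm host, sketched under assumptions: build a small in-memory group from the first `hybrid_worker_count` entries of hv_vm and target only that group. The group name `hybrid_worker` and the use of `localhost` as the play host are hypothetical, and whether boot-iso can consume such a group unchanged has not been checked.

```yaml
# Sketch only: select the first N hv_vm hosts into a hypothetical group so
# later plays or loops touch N hosts instead of the whole hv_vm group.
- name: Build hybrid worker group
  hosts: localhost  # assumption: any single control host works here
  gather_facts: false
  tasks:
    - name: Add the first hybrid_worker_count hv_vm hosts to hybrid_worker
      add_host:
        name: "{{ item }}"
        groups: hybrid_worker  # hypothetical group name
      loop: "{{ groups['hv_vm'][:(hybrid_worker_count | int)] }}"
```

Because the hosts already exist in the inventory, `add_host` only adds the extra group membership; their existing host variables stay in place.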

radez added 2 commits January 16, 2025 12:23
- Specify Podman as the deploy type for the bastion AI container
example podman configmap: https://github.com/openshift/assisted-service/blob/master/deploy/podman/configmap.yml

- no need to patch the cluster network settings after boot
All the same settings are defined at cluster creation
akrzos (Member) left a comment

I was able to start building some clusters where my control plane nodes were bare metal and my worker nodes were VMs. However, I have some feedback.


- name: Libvirt - Check for Virtual Media
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Managers/{{ hostvars[item]['domain_uuid'] }}/VirtualMedia/Cd"
Member

I had to change Managers to Systems because Managers didn't work 🤣


- name: Libvirt - Eject any CD Virtual Media
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Managers/{{ hostvars[item]['domain_uuid'] }}/VirtualMedia/Cd/Actions/VirtualMedia.EjectMedia"
Member

Same here with Managers


- name: Libvirt - Insert virtual media
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Managers/{{ hostvars[item]['domain_uuid'] }}/VirtualMedia/Cd/Actions/VirtualMedia.InsertMedia"
Member

Same here with Managers
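
With the change described in these three comments applied, the virtual media tasks would point at Systems rather than Managers. The sketch below shows the check and eject tasks. Only the path segment is taken from the review; the `method`, `body`, and `status_code` values are assumptions based on standard Redfish usage (GET to read a resource, POST to invoke an action), since they are not visible in the quoted hunks.

```yaml
# Sketch only: same URLs as in the quoted hunks, with Managers replaced by
# Systems as reported in the review. Methods and status codes are assumed.
- name: Libvirt - Check for Virtual Media
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Systems/{{ hostvars[item]['domain_uuid'] }}/VirtualMedia/Cd"
    method: GET

- name: Libvirt - Eject any CD Virtual Media
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Systems/{{ hostvars[item]['domain_uuid'] }}/VirtualMedia/Cd/Actions/VirtualMedia.EjectMedia"
    method: POST
    body_format: json
    body: {}
    status_code: [200, 204]  # assumption: accepted success codes may differ
```

The insert task would change in the same way, with its existing request body left as it is in the PR.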

Comment on lines +62 to +65
"additional_ntp_source": "{{ bastion_controlplane_ip if use_bastion_registry else labs[lab]['ntp_server'] }}",
"api_vips": [{"ip": "{{ controlplane_network_api }}"}],
"ingress_vips": [{"ip": "{{ controlplane_network_ingress }}"}],
"network_type": "{{ networktype }}"
Member

I believe we need to revert this.

Member

I found the root of the original issue that was causing my deployment with this PR to fail. We can leave this as is.

Comment on lines -58 to -95
- name: Patch cluster network settings
  uri:
    url: "http://{{ assisted_installer_host }}:{{ assisted_installer_port }}/api/assisted-install/v2/clusters/{{ ai_cluster_id }}"
    method: PATCH
    status_code: [201]
    return_content: true
    body_format: json
    body: {
        "cluster_networks": [
          {
            "cidr": "{{ cluster_network_cidr }}",
            "cluster_id": "{{ ai_cluster_id }}",
            "host_prefix": "{{ cluster_network_host_prefix }}"
          }
        ],
        "service_networks": [
          {
            "cidr": "{{ service_network_cidr }}",
            "cluster_id": "{{ ai_cluster_id }}",
          }
        ]
      }

- name: Patch cluster ingress/api vip addresses
  uri:
    url: "http://{{ assisted_installer_host }}:{{ assisted_installer_port }}/api/assisted-install/v2/clusters/{{ ai_cluster_id }}"
    method: PATCH
    status_code: [201]
    return_content: true
    body_format: json
    body: {
        "cluster_network_host_prefix": "{{ cluster_network_host_prefix }}",
        "vip_dhcp_allocation": "{{ vip_dhcp_allocation }}",
        "ingress_vips": [{"ip": "{{ controlplane_network_ingress }}"}],
        "api_vips": [{"ip": "{{ controlplane_network_api }}"}],
        "network_type": "{{ networktype }}"
      }

Member

I believe this needs to be reverted as I was unable to make a cluster without including this.

Member

OK, from my findings, we just need to retain lines 58 through 79 so that cluster_networks and service_networks are set. I will investigate whether we can set these at cluster creation as a follow-up.

Successfully merging this pull request may close these issues.

Hypervisor support for MNO clusters
2 participants